A Bootstrapping Approach to Information Extraction Domain Porting
Abstract
This paper presents a seed-driven bootstrapping approach to domain porting that can be used to customize a generic information extraction (IE) capability for a specific domain. The approach assumes a robust, domain-independent IE engine that can continue to be enhanced independently of any particular domain. It combines the strengths of parsing-based symbolic rule learning and a high-performance, linear string-based Hidden Markov Model (HMM) to automatically derive a customized IE system with balanced precision and recall. The key idea is to apply the precision-oriented symbolic rules learned in the first stage to a large corpus in order to construct an automatically tagged training corpus. This training corpus is then used to train an HMM to boost recall. Experiments in named entity (NE) tagging and relationship extraction show performance close to that of supervised learning systems.
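The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the paper's system: the title-based rule, the tag names `PER` and `O`, the four-sentence corpus, and the add-one-smoothed word-level HMM are all assumptions chosen only to show how a precision-oriented rule can label a corpus automatically and how an HMM trained on that corpus might then recover mentions the rule misses.

```python
import math
from collections import defaultdict

# Hypothetical seed rule: a capitalized token right after a title is a person.
TITLES = {"Mr.", "Dr.", "Ms."}

def rule_tag(tokens):
    """Stage 1: precision-oriented rule; tags only what it is sure of."""
    tags = ["O"] * len(tokens)
    for i in range(1, len(tokens)):
        if tokens[i - 1] in TITLES and tokens[i][0].isupper():
            tags[i] = "PER"
    return tags

def train_hmm(tagged_sents, states=("O", "PER")):
    """Stage 2: count transitions and emissions on the auto-tagged corpus."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for tokens, tags in tagged_sents:
        prev = "<s>"
        for tok, tag in zip(tokens, tags):
            trans[prev][tag] += 1
            emit[tag][tok] += 1
            vocab.add(tok)
            prev = tag
    return trans, emit, vocab, states

def log_p(counts, key, n_outcomes):
    """Add-one smoothed log probability."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def viterbi(tokens, model):
    """Most likely tag sequence under the trained HMM."""
    trans, emit, vocab, states = model
    n_words = len(vocab) + 1  # one extra slot for unseen words
    n_states = len(states)
    best = {
        s: (log_p(trans["<s>"], s, n_states) + log_p(emit[s], tokens[0], n_words), [s])
        for s in states
    }
    for tok in tokens[1:]:
        new = {}
        for s in states:
            score, path = max(
                ((best[p][0] + log_p(trans[p], s, n_states), best[p][1]) for p in states),
                key=lambda x: x[0],
            )
            new[s] = (score + log_p(emit[s], tok, n_words), path + [s])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]

# Untagged toy corpus (an assumption; the paper uses a large real corpus).
raw_corpus = [
    "Mr. Smith visited the plant".split(),
    "Dr. Jones met Mr. Smith".split(),
    "the plant opened today".split(),
    "Dr. Jones visited the office today".split(),
]

# Apply the rule to build the automatically tagged training corpus.
auto_tagged = [(sent, rule_tag(sent)) for sent in raw_corpus]
model = train_hmm(auto_tagged)

test_sent = "Smith met Jones today".split()
print(rule_tag(test_sent))        # ['O', 'O', 'O', 'O']: no title, so the rule misses both names
print(viterbi(test_sent, model))  # ['PER', 'O', 'PER', 'O']: the HMM recovers them
```

In this sketch the rule alone finds nothing in the test sentence because no title precedes either name, while the HMM has learned from the rule-tagged corpus that "Smith" and "Jones" tend to be emitted by the `PER` state. A real system would need far more data and would have to deal with the noise that rule-generated labels introduce.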
Related resources
Ontology-Based Information Extraction under a Bootstrapping Approach
The authors present an ontology-based information extraction process, which operates in a bootstrapping framework. The novelty of this approach lies in the continuous semantics extraction from textual content in order to evolve the underlying ontology, while the evolved ontology enhances in turn the information extraction mechanism. This process was implemented in the context of the R&D project...
Bootstrapping an Ontology-based Information Extraction System
Automatic intelligent web exploration will benefit from shallow information extraction techniques if the latter can be brought to work within many different domains. The major bottleneck for this, however, lies in the so far difficult and expensive modeling of lexical knowledge, extraction rules, and an ontology that together define the information extraction system. In this paper we present a ...
An XML-Based Bootstrapping Method for Pattern Acquisition
Extensible Markup Language (XML) has been widely used as a middleware because of its flexibility. Fixed domain is one of the bottlenecks of Information Extraction (IE) technologies. In this paper we present an XML-based domain-adaptable bootstrapping method of pattern acquisition, which focuses on minimizing the cost of domain migration. The approach starts from a seed corpus with some seed patt...
Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping
Information extraction systems usually require two dictionaries: a semantic lexicon and a dictionary of extraction patterns for the domain. We present a multilevel bootstrapping algorithm that generates both the semantic lexicon and extraction patterns simultaneously. As input, our technique requires only unannotated training texts and a handful of seed words for a category. We use a mutual boo...
Bootstrapping Noun Groups Using Closed-Class Elements Only
The identification of noun groups in text is a well-researched task and serves as a pre-step for other natural language processing tasks, such as the extraction of keyphrases or technical terms. We present a first version of a noun group chunker that, given an unannotated text corpus, adapts itself to the domain at hand in an unsupervised way. Our approach is inspired by findings from cognitive...
Publication date: 2004